Additional clarification for override_base_seq_len #75
Is your pull request related to a problem? Please describe.
In discussions with multiple users in support threads, new users frequently set override_base_seq_len instead of max_seq_len to configure context length. This causes problems if they then rely on the built-in automatic NTK RoPE scaling for context extension.
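To illustrate why the misconfiguration matters, here is a minimal sketch. It assumes that the automatic NTK RoPE alpha is derived from the ratio of max_seq_len to the model's base sequence length. The helper `estimate_rope_alpha`, its quadratic coefficients, and the 4096/8192 lengths are hypothetical and chosen for illustration; they are not presented as the project's actual implementation.

```python
# Hypothetical illustration: an automatic NTK RoPE alpha that depends on the
# ratio between the requested context (max_seq_len) and the model's base
# sequence length. The coefficients are an assumed empirical fit.


def estimate_rope_alpha(base_seq_len: int, max_seq_len: int) -> float:
    """Return an assumed NTK RoPE alpha for extending base_seq_len to max_seq_len."""
    ratio = max_seq_len / base_seq_len
    if ratio <= 1.0:
        # No extension requested, so no scaling is applied.
        return 1.0
    # Assumed quadratic fit of alpha against the extension ratio.
    return -0.13436 + 0.80541 * ratio + 0.28833 * ratio ** 2


native_base = 4096  # assumed native context length of an example model
target = 8192       # desired context length

# Intended use: keep the base at the model's native value and raise max_seq_len.
print(estimate_rope_alpha(native_base, target))

# Misuse: overriding the base to the target collapses the ratio to 1,
# so no scaling is applied even though the context has doubled.
print(estimate_rope_alpha(target, target))
```

Under these assumptions, only the first call yields an alpha above 1. The second call returns 1.0, which means the model runs past its native context without any RoPE scaling.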
The documentation also incorrectly cites Mixtral as an example model with an incorrect base sequence length. This only applies to Mistral 7B when sliding window attention is inactive in exl2.
Why should this feature be added?
It should lead to fewer users running incorrect configurations.
Examples
None
Additional context
None